adversarial combinatorial bandit
Combinatorial Bandits Revisited Richard Combes
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Information Technology > Data Science > Data Mining > Big Data (0.90)
- Information Technology > Artificial Intelligence > Machine Learning (0.69)
Adversarial Combinatorial Bandits with General Non-linear Reward Functions
Chen, Xi, Han, Yanjun, Wang, Yining
In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on adversarial linear combinatorial bandit. {The adversarial combinatorial bandit with general non-linear reward is an important open problem in bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback.} We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d< K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. {Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures.} Our result also finds applications to adversarial assortment optimization problem in online recommendation. We show that in the worst-case of adversarial assortment problem, the optimal algorithm must treat each individual $\binom{N}{K}$ assortment as independent.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Alachua County > Gainesville (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)
Combinatorial Bandits Revisited
Combes, Richard, Shahi, Mohammad Sadegh Talebi Mazraeh, Proutiere, Alexandre, lelarge, marc
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the adversarial setting under bandit feedback, we propose CombEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.
- Information Technology > Data Science > Data Mining > Big Data (0.90)
- Information Technology > Artificial Intelligence > Machine Learning (0.69)